perm filename STRUCT.AIM[LSP,JRA] blob sn#099702 filedate 1974-04-27 generic text, type C, neo UTF8


.GROUP SKIP 20;
.BEGIN CENTERIT;SELECT 2;
←An alternative to Recursive Data Structures
.END

.NEXT PAGE;
.BEGIN TURN ON "←","#";
In a recent AIM, Hoare proposed an implementation of recursive
data structures and a corresponding notation sufficient to describe
a type structure for LISP-like languages.

This memo proposes  alternative approaches, without recursive data structures
and with better notation for manipulating the resulting language.
The proposed language is called structured LISP (did I hear trumpets?).

First, there are non-trivial problems involved
in implementing user-defined recursive data structures.
For example, assume we have a type %3INT%* for integers; when we enter a block 
and process a declaration of the form %3x:INT%* we allocate space (on the
stack) for the variable %3x%*. If we have a type ARRAY, a declaration of the
form %3INT ARRAY x[n:m]%* will force the current values of %3n%* and %3m%*
to be sampled, and an appropriately sized area will be allocated.
What about something like %3x: SEXPR%*, where SEXPR is a defined type
corresponding to the LISP Sexpr, i.e. a realization as a binary tree?
How much space is allocated now? The extent of %3x%* is not known; 
and if a scheme analogous to the above is attempted, we will surely
run out of space: allocating space for a sexpr requires
allocating space for a sexpr, .....
There are actually %2two%* problems here: 1) recursive types; 2) unions of
types.
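The non-termination can be made concrete. Here is a minimal sketch (Python, purely illustrative; the type table and word sizes are my invention, not part of this memo) of an allocator trying to compute a fixed inline size for a declared type: it succeeds for flat types but never bottoms out for the recursive SEXPR.

```python
# Hypothetical type table: an int is a primitive size in words;
# otherwise a (kind, components) pair where kind is "struct" or "union".
TYPES = {
    "INT": 1,                              # one word
    "PAIR": ("struct", ["INT", "INT"]),    # flat struct: finite size
    "DTPR": ("struct", ["SEXPR", "SEXPR"]),
    "SEXPR": ("union", ["INT", "DTPR"]),   # sexpr ::= atom | dtpr
}

def flat_size(name, depth=0, limit=50):
    """Words needed to allocate a value of this type inline, with no
    pointers: structs add their components, unions take the maximum.
    For a recursive type the recursion never terminates, so we bail out."""
    if depth > limit:
        raise RecursionError("type is recursive: no finite inline size")
    t = TYPES[name]
    if isinstance(t, int):
        return t
    kind, parts = t
    sizes = [flat_size(p, depth + 1, limit) for p in parts]
    return sum(sizes) if kind == "struct" else max(sizes)
```

The usual escape, of course, is exactly what LISP does: represent the recursive positions as pointers of fixed size and allocate the components on a heap.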

****prob of forward ref: xx u.def.ds; sexpr ← struct[....xx]

 v.s. pair: sexpr ← atom | dtpr
dtpr ← struct[sexpr sexpr]
    with forward reference to dtpr.

Before unveiling the counter proposal, it is necessary to review the
LISP approach to the specification of semantics. It should not be necessary
at this late  date to do so, but fifteen years later people still
do not understand what  LISP is.


There is no confusion about the domain of LISP functions; everyone knows
that the domain is Sexpressions.  Sexpressions are defined by the following BNF
equations.

←<sexpr> ::= <atom> | (<sexpr> . <sexpr>)


The data structures which LISP operates on are then realizations of Sexprs.
The usual realizations (as data structures) of "dotted pairs" and "atomic symbols" 
are binary trees and LISP atoms. There is a distinction worth making
between data structures and sorts or types as used in logic or mathematics.

What %2does%* constitute a data structure definition? 
Normally, when introducing LISP to the uninitiated and unwashed we begin
with dotted-pairs, introducing the LISP primitives on these data structures.
Then we say "Look, dotted-pair notation is too messy  in many of the
domains we wish to represent." A notation for  sequences is introduced
called list notation. List notation is usually glossed over as just an
abbreviation of a longer more complicated dotted expression.
But something more than abbreviation is happening here. To wit, we are
%2really%* introducing a new data structure. Granted, the underlying
representation %2is%* dotted pairs; but we have introduced more.
We have really extended the %3read%* routine to recognize our abbreviation;
and we have extended our %3print%* routine to print list-notation
whenever the internal representation can be "unparsed" to a list.
At least implicitly, we have also extended our primitive functions
and predicates to operate on lists or sequences. Thus we claim that  
in defining data structures, at least these amenities must
be catered for. As we will see, other information is necessary or desirable when defining
a new data structure.
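As an illustration of the point (a sketch of mine, not the memo's notation: dotted pairs represented as Python tuples, atoms as strings, NIL as the string "NIL"), here is the %3read%*-side abbreviation and a %3print%* routine that unparses to list notation exactly when the pairs form a proper list:

```python
def from_list(items):
    """What list notation abbreviates: (A B C) is (A . (B . (C . NIL)))."""
    s = "NIL"
    for x in reversed(items):
        s = (x, s)
    return s

def unparse(s):
    """Print list notation whenever the dotted pairs form a proper list;
    fall back to dot notation for the final cdr otherwise."""
    if not isinstance(s, tuple):
        return s                            # an atom prints as itself
    parts = []
    while isinstance(s, tuple):
        parts.append(unparse(s[0]))
        s = s[1]
    if s == "NIL":                          # proper list: use the abbreviation
        return "(" + " ".join(parts) + ")"
    return "(" + " ".join(parts) + " . " + s + ")"
```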

Next, LISP syntax is specified by the following BNF equations:

.BEGIN GROUP;NOFILL;

<form>	::= <constant>
	::= <variable>
	::= <function>[<arg>; ... <arg>]
	::= [<form> → <form>; ... <form> → <form>]

etc.
.END


How do we use LISP functions, indeed how do we use %2any%* procedure-oriented
language? Given a problem to be programmed and run on a machine, we encode
the problem domain in the domain of the programming language. In the case of LISP
we encode in Sexprs. Then we represent algorithms which operate on the problem domain
as procedures of the programming language. Again, in the case of LISP, we
would write LISP functions. Then we run the procedure on a machine with specific
data. After execution we must usually re-interpret the output as an answer to our
input-problem.

A typical example in LISP folklore is differentiation of polynomials.
We encode polynomials in Cambridge Polish; noticing that differentiation is
recursive and mechanizable, we write a LISP function, %3diff[exp;x]%*, 
which is supposed to operate on %3exp%*, a polynomial, and %3x%*, a variable
⊗↓actually, %2representations of ..%*, but you knew that.← and produce a
Sexpr output in Cambridge Polish which can be interpreted as the result of
differentiation.
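A sketch of such a %3diff%*, transcribed into Python (the PLUS/TIMES operator names and the encoding of expressions as nested lists, with strings for variables and ints for constants, are assumptions of mine):

```python
def diff(exp, x):
    """Differentiate a Cambridge-Polish expression with respect to x."""
    if isinstance(exp, str):               # a variable
        return 1 if exp == x else 0
    if isinstance(exp, int):               # a constant
        return 0
    op, *args = exp
    if op == "PLUS":                       # d(u+v) = du + dv
        return ["PLUS"] + [diff(a, x) for a in args]
    if op == "TIMES" and len(args) == 2:   # d(u*v) = du*v + u*dv
        u, v = args
        return ["PLUS", ["TIMES", diff(u, x), v], ["TIMES", u, diff(v, x)]]
    raise ValueError("unhandled operator: " + str(op))
```

Note that the output, like the memo says, is itself a representation: it must be re-interpreted (and usually simplified) to be read as the answer.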

To hand-simulate the execution of such LISP functions, we must have a
reasonably clear intuitive picture of what evaluation of LISP expressions
means. 
Indeed, we can describe informally what the semantics of the language is to be:
this  entails a description of conditional expressions, call-by-value, 
and λ-expressions. A convincing argument can be given that this evaluation process
is mechanical. The above  paradigm can be applied to LISP itself. Certainly the
evaluation is mechanical. How to map LISP expressions to data structures?
That's the  Sexpr form of LISP expressions. %3eval%* operates on such.

So the programming language LISP is simply the data structure representation of
LISP. That is why it is so ugly. However, the resulting power of the %2programming
language%* LISP is a sufficient cosmetic to cover these small warts.

Complaints about LISP generally fall into two classes. First,
too many ____ing parens: it should be clear how to handle %2this%*
complaint.

A %2justifiable%* complaint is LISP's single minded look at the world of data structures.
Everything is a list, dotted pair, or an atom; and in terms of the underlying 
representation, everything is a dotted pair.
Clearly there are problem domains
which are more naturally or more efficiently modeled by other types 
of data structures. This is one thread leading to structured LISP.

For another thread we look at graphical languages. One of the most
striking features of a graphical language like AMBIT/G is its ability
to express a great deal of information with a small amount of notation.
This ability seems to escape conventional high-level languages. The argument
is usually made that this is a feature of graphical languages. Contemporary
languages, it is argued, are based on mathematical notation which is linear
and was invented to handle domains whose data structuring is weak or nonexistent.
AMBIT advocates put forth another argument which has more substance:

"Indeed, the most important "construction" activity [in software construction,
AI, etc.] seems to be the structuring of data rather than programs....
 ...[in AMBIT/G] Only when his data design is complete does he start programming."
Two points are made here: we want clarity of notation and a stronger role
played by data structures.

Next, how do we wish to define the semantics of the language? By far the most
natural style of definition is that of LISP: map programs onto the data structures
of the language and write a function to act as an interpreter on this
data-structure representation.  As we have seen, the %2programming language%*
LISP is content to operate at the level of data structure representation, rather
than at the level of "real" LISP expressions. Well-known benefits accrue
from this style of programming, but there is also a well known technique
for augmenting Sexpression LISP such that %3hoi polloi%* might program in
"real LISP". The cleanest and simplest approach, SDIO of L. Quam, augments the BNF
equations defining "real" LISP with semantic routines, specifying the
mapping to  Sexpr LISP. SDIO  manufactures a parser and an unparser, so that
though programs operate internally on sexprs, the external format is real LISP.
SDIO really creates the concrete syntax#↔#abstract syntax mappings.
This technique is also used by EL1 and is also advocated here.


The final point relates to the importance of data structuring. If we are to rely
on data structures more heavily, then the language features should reflect this.
Arguments to functions should be checked for mis-matched types, and the
programming constructs themselves should be designed to handle specific
kinds of data structures. Also, since our programming activity is to rely
more heavily on good data design, how are we to specify data structures
precisely?  How are we to allow the user to describe his own data structuring?
Again, the answers to these questions stem from Quam's SDIO.

This is where we enter the discussion of structured LISP.

The accepted means for specifying static quantities, be they syntax
or data structures, is BNF. We will first use BNF to begin description of the
language, keeping in mind that we wish also to use BNF for data design.

The external appearance of the language should be LISP-like. 
The semantic specification should also be LISP-like, for ease of implementation,
learnability, and clarity of description.
But it should allow user-defined data structures, and should be rigorously typed.

We thus wish to map structured LISP programs onto data-structures as LISP
does. The class of data structures should be richer.
The clue  here comes from EL1.

Look at slightly modified BNF equations for LISP:

form 	::= var|const|f_app| ...

f_app	::= func [args]

args	::= arg ; args | arg



Thus BNF equations come in three flavours: RHS with alternatives;
RHS with a fixed number of components; and RHS which attempts to describe an
indeterminate-length quantity.

We will map these styles of BNF equations onto three separate data structures.
Indeterminate length specifications will map to sequences; fixed length RHS
will map to structures; and alternatives will map to pointers.
(We will speak of implementation later, but it should be clear that efficiencies
can be gained by obvious storage conventions for the different data structures.)
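The three mappings can be sketched in a modern notation (Python here, purely illustrative and not part of the proposal: structures as dataclasses with named components, sequences as lists of indeterminate length, and alternatives as a union over the possible structures):

```python
from dataclasses import dataclass
from typing import List, Union

# alternatives (form ::= var | const | f_app) become a union, below

@dataclass
class Var:                      # a trivial structure for variables
    name: str

@dataclass
class Const:                    # and one for constants
    value: int

@dataclass
class FApp:                     # fixed-length RHS -> structure
    function: str
    arg_list: List["Form"]      # indeterminate-length RHS -> sequence

Form = Union[Var, Const, FApp]  # a "pointer" holds one of these
```

Usage: `FApp("foo", [Var("x"), Const(1)])` is the data-structure representation of %3foo[x;1]%*.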

.BEGIN NOFILL;GROUP;

For example:

form	::= var
	::= const
	::= f_app

f_app	::= struct[function:fnc;arg_list:args]

args	::= seq[arg]

.END

.BEGIN TURN OFF "{","}";
Notice this is just the abstract syntax specification of McCarthy.
As with LISP, it should be clear that an evaluator for the %2data-structure
representation of expressions%* can be described. If we
wished to use { ...} as sequence delimiters and [  ... ] as structure
delimiters we could write data-structure representations of programs just as
we write list representation of LISP programs. No one in his right
mind would wish to do so however.


%3foo[x;y]%* could be represented as: [foo {x y}]

Notice we need no representation for pointers. Pointers can hold
%2one%* of a specified class of structures; when that choice is made, it is 
fixed. Pointers are %2not%* unions.
.END

So far the only language feature which we have advocated is the typing of the
arguments. What is the purpose of typing? It says that we wish to delimit
the kind of argument which is passed to the function. We strongly advocate
full typing. Any attempts at "coercion" will be severely punished!

Granted that types %2are%* important, it then seems natural to predict
that much of the activity of a program will deal with types. Indeed
it currently appears that a %2very%* large percentage of the normal 
programming activity deals with type checking (recall the AMBIT/G quotes). 

Most functions
will accept %2classes%* of types; the classes are defined by the data definitions.
For example, a unary function with "sexpr" type should operate
on either atoms or dotted pairs.  Such functions are called "generic".
We therefore offer a programming construct named %3generic%* which is used
to dispatch to pieces of program to handle each subcase of the type structure.
The dispatching will use a generalization of the pattern matching scheme
proposed by Hoare and a generalization of GENERIC of EL1.

(As you can see this proposal is a mixture of many ideas, very few of which are new.
E.g.

%3generic%* came from EL1 => BASEL => ELF => ... => Able

pattern-matcher came from Hoare => Pengelley => Burstall => ... => Cain

semantic specification came from EL1 => LISP => God ... => McCarthy

SDIO came from Quam

etc.

)


Clearly it is time for examples:

.BEGIN NOFILL;TURN OFF "{","}";

Here's differentiation

The data definitions (or abstract syntax):

exp	::= var
	::= const
	::= sum
	...

sum	::= struct[sum1:exp;sum2:exp]

the program (most of the syntax is irrelevant):

diff[e:exp; x:var]exp
 generic(e)
	[var]      => {x=e => 1;0}; 
	[const]    => 0; 
	[sum(u,v)] => sum(diff(u,x),diff(v,x));
 ...
 end;
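For comparison, the same generic dispatch can be sketched in Python (a sketch of mine, not the proposal's semantics: isinstance tests play the role of the type dispatch, attribute selection plays the role of sum(u,v) binding components, and the dataclass constructor plays the role of sum on the RHS):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Var:
    name: str

@dataclass
class Const:
    value: int

@dataclass
class Sum:                      # sum ::= struct[sum1:exp; sum2:exp]
    sum1: "Exp"
    sum2: "Exp"

Exp = Union[Var, Const, Sum]

def diff(e, x):
    """generic(e): dispatch on the type of e, binding its components."""
    if isinstance(e, Var):                  # [var] => {x=e => 1; 0}
        return Const(1) if e.name == x.name else Const(0)
    if isinstance(e, Const):                # [const] => 0
        return Const(0)
    if isinstance(e, Sum):                  # [sum(u,v)] => sum(diff(u,x),diff(v,x))
        u, v = e.sum1, e.sum2
        return Sum(diff(u, x), diff(v, x))
    raise TypeError(e)
```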

.BEGIN FILL;
The current programming style envisioned is to make data definitions
precede the actual program (a la AMBIT/G). Dynamic data structures
seem to be counterintuitive.
.END
.BEGIN FILL;
The type-checker may make assignments of local variables to values of components 
of a data structure: thus sum(u,v). This is like Hoare, which is like ...%c∞%*.
Thus "sum" on the LHS of => is a predicate and selector; "sum" on the RHS of
=> is a constructor. The test expression may NOT go deeper than examination
and assignment of components of data structures; NEVER look at sub-components.
(structured programming?)
.END

generic may also run on more than one arg. For example:

sexpr	::= atom
	::= dtpr

dtpr	::= struct[car:sexpr;cdr:sexpr]


equal[x:sexpr;y:sexpr]boolean
 generic(x,y)
	[atom,atom]           => x=y;
	[dtpr(a,b),dtpr(c,d)] => {equal(a,c) => equal(b,d);FALSE};
	FALSE ;
 end

The car function could be:

car[x:dtpr]sexpr
  x.car			        select car-th component of x
				car should not be defined for ATOM !
Notice here we use a named component. Seldom should names be needed,
but if so, they are available. 

atom[x:sexpr]boolean
 generic(x)
	[atom] => TRUE;
	FALSE;
 end;
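A Python transcription of the two-argument generic (again a sketch of mine, assuming atoms are represented as Python strings):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Dtpr:                     # dtpr ::= struct[car:sexpr; cdr:sexpr]
    car: "Sexpr"
    cdr: "Sexpr"

Atom = str                      # assumption: atoms as strings
Sexpr = Union[Atom, Dtpr]

def equal(x, y):
    """generic(x,y): dispatch on the types of BOTH arguments."""
    if isinstance(x, str) and isinstance(y, str):      # [atom,atom] => x=y
        return x == y
    if isinstance(x, Dtpr) and isinstance(y, Dtpr):    # [dtpr(a,b),dtpr(c,d)]
        return equal(x.car, y.car) and equal(x.cdr, y.cdr)
    return False                                       # mixed types => FALSE

def atom(x):
    """generic(x): [atom] => TRUE; FALSE."""
    return isinstance(x, str)
```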
.BEGIN FILL;
There is another programming construct to handle data definitions on
sequences. "on" runs on sequences: ε matches the empty sequence;
(x⊗y) means x is the first element and y is the rest.
.END

1. the deadly polyadd

poly	::= seq[term]

term	::= struct[coef:number;expo:expo]

expo	::= seq[num]


polyadd[p:poly;q:poly]poly
 on(p,q)
	[ε,*] => q
	[*,ε] => p
	[(p1:term(a,b)⊗c),(p2:term(d,e)⊗f)] => {b=e => {a=-d => polyadd(c,f);
						        poly(term(a+d,b),polyadd(c,f))}
					        b>e => poly(p1,polyadd(c,q))
					        poly(p2,polyadd(p,f))}

 end;
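A sketch of polyadd in Python (with one simplification of mine, not the memo's: a single integer exponent per term rather than seq[num], so terms are compared and ordered by that exponent):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Term:                     # term ::= struct[coef:number; expo:...]
    coef: int
    expo: int                   # simplification: one exponent, not seq[num]

Poly = List[Term]               # poly ::= seq[term]

def polyadd(p, q):
    """'on' over two sequences: empty matches ε, (x⊗y) splits first/rest."""
    if not p:                                   # [ε,*] => q
        return q
    if not q:                                   # [*,ε] => p
        return p
    (t1, *c), (t2, *f) = p, q                   # (p1⊗c), (p2⊗f)
    if t1.expo == t2.expo:
        if t1.coef == -t2.coef:                 # coefficients cancel
            return polyadd(c, f)
        return [Term(t1.coef + t2.coef, t1.expo)] + polyadd(c, f)
    if t1.expo > t2.expo:                       # p's leading term survives
        return [t1] + polyadd(c, q)
    return [t2] + polyadd(p, f)                 # q's leading term survives
```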

I am not too enchanted with the "on notation", but it suffices.

.BEGIN FILL;
The implementation of such a language should be quite efficient:
"[ ]" does not require an expensive pattern match. It only matches types
and perhaps associates immediate components with variables. Obviously
sequences and structures can be efficiently stored. The calling sequence
is efficient: essentially a dispatch on type. 
.END
.BEGIN FILL;
Here are what appear to be immediate benefits

1. richer data structures than dotted-pairs; user-defined data structures.

2. combination of selectors and predicates into a simple matcher.

3. constructs of the language parallel the data typing: generic and on.
Makes for simple and clean programs.

4. excellent readability. The evaluator for the language consists of one page
of data structure definitions and 2 pages of program.

5. efficient data and program.

6. formal properties, though unexplored, look promising. Basically this
approach replaces simple data structures and complicated programs with
complicated data and simple programs. Most programs simply become type-
checkers, dispatching on types to other simple programs. This should
have some benefits in forcing properties which normally have to be proved
about a program back to being AXIOMS about the chosen data structures.
And relationships involving data structures (static) should be easier
to attack than those involving programs (dynamic).


Most of the ideas have been around a long time; what seems novel is their
combination and the extended use of generic. This formulation is only about
a week old; there may be holes. But it does seem natural and interesting.
I would be interested in your comments.

			j allen
.END
.END

.NEXT PAGE
←%2Bibliography%*

Hoare	recursive data structures

AMBIT/G	documentation

Quam	SDIO

Wegbreit EL1

Allen	LISP

Burstall structural induction

.END